Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 84038 |
| Missing cells | 16885 |
| Missing cells (%) | 1.3% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 10.3 MiB |
| Average record size in memory | 128.0 B |
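The overview figures can be cross-checked against each other. The sketch below is a minimal check, not the profiler's own code. It assumes that the missing-cell percentage is taken over all cells (observations × variables) and that every column costs about 8 bytes per row. The per-variable missing counts are copied from the variable summaries in this report.

```python
# Values copied from the report.
n_rows = 84038
n_cols = 16
missing_cells = 16885

# Per-variable missing counts listed in the variable summaries:
# 40, 59, 3637, and three variables with 4383 each.
missing_per_variable = [40, 59, 3637, 4383, 4383, 4383]

# Share of missing cells over the whole table (rows x columns).
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Assumption: 8 bytes per value in every column.
bytes_per_row = 8 * n_cols
total_mib = n_rows * bytes_per_row / 2**20

print(f"sum of per-variable missing counts: {sum(missing_per_variable)}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {bytes_per_row} B")
print(f"total size:    {total_mib:.1f} MiB")
```

Under these assumptions the recomputed values agree with the table after rounding.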
Variable types
| NUM | 9 |
|---|---|
| CAT | 7 |
Warnings
| geneSymbol has a high cardinality: 9703 distinct values | High cardinality |
|---|---|
| diseaseId has a high cardinality: 11181 distinct values | High cardinality |
| diseaseName has a high cardinality: 11181 distinct values | High cardinality |
| diseaseClass has a high cardinality: 755 distinct values | High cardinality |
| source has a high cardinality: 51 distinct values | High cardinality |
| DPI is highly correlated with DSI | High correlation |
| DSI is highly correlated with DPI | High correlation |
| diseaseClass has 3637 (4.3%) missing values | Missing |
| EI has 4383 (5.2%) missing values | Missing |
| YearInitial has 4383 (5.2%) missing values | Missing |
| YearFinal has 4383 (5.2%) missing values | Missing |
| NofSnps is highly skewed (γ1 = 89.28782133) | Skewed |
| NofPmids has 7131 (8.5%) zeros | Zeros |
| NofSnps has 75760 (90.1%) zeros | Zeros |
Reproduction
| Analysis started | 2020-11-18 22:52:29.404232 |
|---|---|
| Analysis finished | 2020-11-18 22:53:13.162224 |
| Duration | 43.76 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
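A report of this kind is typically produced with a few lines of Python. The following is a minimal sketch, assuming pandas-profiling is installed and the data is already loaded into a pandas DataFrame. `ProfileReport` and `to_file` are part of pandas-profiling. The function name `build_profile_report`, the output path, and the title are hypothetical, and the settings from the `config.yaml` mentioned above are not reproduced here.

```python
def build_profile_report(df, output_path="report.html", title="Dataset profile"):
    """Write an HTML profile of a pandas DataFrame and return the output path.

    Hypothetical helper: the name, default path, and title are illustrative.
    """
    # Imported lazily so the helper can be defined without the package installed.
    from pandas_profiling import ProfileReport

    profile = ProfileReport(df, title=title)
    profile.to_file(output_path)
    return output_path
```

A call such as `build_profile_report(df)` would write `report.html` to the working directory.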
geneId
Real number (ℝ≥0)
| Distinct | 9703 |
|---|---|
| Distinct (%) | 11.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 719625.0238 |
|---|---|
| Minimum | 1 |
| Maximum | 109580095 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 434 |
| Q1 | 2662 |
| median | 5468 |
| Q3 | 10457 |
| 95-th percentile | 220972 |
| Maximum | 109580095 |
| Range | 109580094 |
| Interquartile range (IQR) | 7795 |
Descriptive statistics
| Standard deviation | 8284285.301 |
|---|---|
| Coefficient of variation (CV) | 11.51194723 |
| Kurtosis | 142.0155591 |
| Mean | 719625.0238 |
| Median Absolute Deviation (MAD) | 3412 |
| Skewness | 11.99847004 |
| Sum | 6.047584775e+10 |
| Variance | 6.862938294e+13 |
| Monotonicity | Increasing |
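Several of the descriptive statistics for geneId are simple functions of the others. The sketch below recomputes them from the reported values, assuming the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared).

```python
# geneId statistics copied from the report.
mean = 719625.0238
std = 8284285.301
minimum, maximum = 1, 109580095
q1, q3 = 2662, 10457

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr)
print(f"CV = {cv:.4f}")
print(f"Variance = {variance:.4e}")
```

The mean (719625.0238) lies far above the median (5468), which matches the strong positive skewness reported for this identifier column.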
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 7124 | 340 | 0.4% | |
| 6648 | 285 | 0.3% | |
| 3569 | 270 | 0.3% | |
| 5443 | 241 | 0.3% | |
| 5743 | 239 | 0.3% | |
| 7157 | 232 | 0.3% | |
| 3553 | 231 | 0.3% | |
| 4524 | 192 | 0.2% | |
| 4843 | 184 | 0.2% | |
| 5728 | 182 | 0.2% | |
| Other values (9693) | 81642 | 97.1% |
| Value | Count | Frequency (%) | |
| 1 | 2 | < 0.1% | |
| 2 | 26 | < 0.1% | |
| 9 | 14 | < 0.1% | |
| 10 | 39 | < 0.1% | |
| 12 | 3 | < 0.1% |
| Value | Count | Frequency (%) | |
| 109580095 | 5 | < 0.1% | |
| 107305681 | 1 | < 0.1% | |
| 107075310 | 1 | < 0.1% | |
| 106783499 | 1 | < 0.1% | |
| 106481323 | 2 | < 0.1% |
geneSymbol
Categorical
| Distinct | 9703 |
|---|---|
| Distinct (%) | 11.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 656.5 KiB |
| Value | Count | Frequency (%) | |
| TNF | 340 | 0.4% | |
| SOD2 | 285 | 0.3% | |
| IL6 | 270 | 0.3% | |
| POMC | 241 | 0.3% | |
| PTGS2 | 239 | 0.3% | |
| TP53 | 232 | 0.3% | |
| IL1B | 231 | 0.3% | |
| MTHFR | 192 | 0.2% | |
| NOS2 | 184 | 0.2% | |
| PTEN | 182 | 0.2% | |
| Other values (9693) | 81642 | 97.1% |
Frequencies of value counts
Unique
| Unique | 2161 |
|---|---|
| Unique (%) | 2.6% |
Histogram of lengths of the category
Length
| Max length | 12 |
|---|---|
| Median length | 5 |
| Mean length | 4.83180228 |
| Min length | 2 |
DSI
Real number (ℝ≥0)
| Distinct | 320 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 40 |
| Missing (%) | < 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5388791281 |
|---|---|
| Minimum | 0.231 |
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 0.231 |
|---|---|
| 5-th percentile | 0.324 |
| Q1 | 0.445 |
| median | 0.533 |
| Q3 | 0.623 |
| 95-th percentile | 0.78 |
| Maximum | 1 |
| Range | 0.769 |
| Interquartile range (IQR) | 0.178 |
Descriptive statistics
| Standard deviation | 0.1341011444 |
|---|---|
| Coefficient of variation (CV) | 0.248851992 |
| Kurtosis | 0.1746056657 |
| Mean | 0.5388791281 |
| Median Absolute Deviation (MAD) | 0.088 |
| Skewness | 0.3534637234 |
| Sum | 45264.769 |
| Variance | 0.01798311694 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 0.445 | 756 | 0.9% | |
| 0.529 | 727 | 0.9% | |
| 0.379 | 646 | 0.8% | |
| 0.821 | 611 | 0.7% | |
| 0.628 | 597 | 0.7% | |
| 0.659 | 583 | 0.7% | |
| 0.674 | 578 | 0.7% | |
| 0.7 | 575 | 0.7% | |
| 0.619 | 574 | 0.7% | |
| 0.653 | 573 | 0.7% | |
| Other values (310) | 77778 | 92.6% |
| Value | Count | Frequency (%) | |
| 0.231 | 340 | 0.4% | |
| 0.236 | 232 | 0.3% | |
| 0.248 | 270 | 0.3% | |
| 0.266 | 167 | 0.2% | |
| 0.276 | 231 | 0.3% |
| Value | Count | Frequency (%) | |
| 1 | 284 | 0.3% | |
| 0.931 | 376 | 0.4% | |
| 0.89 | 430 | 0.5% | |
| 0.861 | 458 | 0.5% | |
| 0.839 | 550 | 0.7% |
DPI
Real number (ℝ≥0)
| Distinct | 25 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 59 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6934339299 |
|---|---|
| Minimum | 0.038 |
| Maximum | 0.962 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 0.038 |
|---|---|
| 5-th percentile | 0.231 |
| Q1 | 0.577 |
| median | 0.769 |
| Q3 | 0.846 |
| 95-th percentile | 0.923 |
| Maximum | 0.962 |
| Range | 0.924 |
| Interquartile range (IQR) | 0.269 |
Descriptive statistics
| Standard deviation | 0.2123399344 |
|---|---|
| Coefficient of variation (CV) | 0.3062150917 |
| Kurtosis | 0.3762738081 |
| Mean | 0.6934339299 |
| Median Absolute Deviation (MAD) | 0.116 |
| Skewness | -1.05059721 |
| Sum | 58233.888 |
| Variance | 0.04508824776 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=25)
| Value | Count | Frequency (%) | |
| 0.808 | 9643 | 11.5% | |
| 0.885 | 9087 | 10.8% | |
| 0.846 | 8748 | 10.4% | |
| 0.769 | 6862 | 8.2% | |
| 0.731 | 5823 | 6.9% | |
| 0.923 | 5675 | 6.8% | |
| 0.692 | 4921 | 5.9% | |
| 0.654 | 3982 | 4.7% | |
| 0.962 | 3410 | 4.1% | |
| 0.615 | 3378 | 4.0% | |
| Other values (15) | 22450 | 26.7% |
| Value | Count | Frequency (%) | |
| 0.038 | 139 | 0.2% | |
| 0.077 | 733 | 0.9% | |
| 0.115 | 790 | 0.9% | |
| 0.154 | 800 | 1.0% | |
| 0.192 | 1008 | 1.2% |
| Value | Count | Frequency (%) | |
| 0.962 | 3410 | 4.1% | |
| 0.923 | 5675 | 6.8% | |
| 0.885 | 9087 | 10.8% | |
| 0.846 | 8748 | 10.4% | |
| 0.808 | 9643 | 11.5% |
diseaseId
Categorical
| Distinct | 11181 |
|---|---|
| Distinct (%) | 13.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 656.5 KiB |
| Value | Count | Frequency (%) | |
| C0006142 | 1074 | 1.3% | |
| C0036341 | 883 | 1.1% | |
| C0023893 | 774 | 0.9% | |
| C0009402 | 702 | 0.8% | |
| C0033578 | 616 | 0.7% | |
| C0376358 | 616 | 0.7% | |
| C0678222 | 538 | 0.6% | |
| C1458155 | 527 | 0.6% | |
| C4704874 | 525 | 0.6% | |
| C1257931 | 525 | 0.6% | |
| Other values (11171) | 77258 | 91.9% |
Frequencies of value counts
Unique
| Unique | 6427 |
|---|---|
| Unique (%) | 7.6% |
Histogram of lengths of the category
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
diseaseName
Categorical
| Distinct | 11181 |
|---|---|
| Distinct (%) | 13.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 656.5 KiB |
| Value | Count | Frequency (%) | |
| Malignant neoplasm of breast | 1074 | 1.3% | |
| Schizophrenia | 883 | 1.1% | |
| Liver Cirrhosis, Experimental | 774 | 0.9% | |
| Colorectal Carcinoma | 702 | 0.8% | |
| Prostatic Neoplasms | 616 | 0.7% | |
| Malignant neoplasm of prostate | 616 | 0.7% | |
| Breast Carcinoma | 538 | 0.6% | |
| Mammary Neoplasms | 527 | 0.6% | |
| Mammary Neoplasms, Human | 525 | 0.6% | |
| Mammary Carcinoma, Human | 525 | 0.6% | |
| Other values (11171) | 77258 | 91.9% |
Frequencies of value counts
Unique
| Unique | 6427 |
|---|---|
| Unique (%) | 7.6% |
Histogram of lengths of the category
Length
| Max length | 177 |
|---|---|
| Median length | 23 |
| Mean length | 24.18607059 |
| Min length | 4 |
diseaseType
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 656.5 KiB |
| Value | Count | Frequency (%) | |
| disease | 60478 | 72.0% | |
| phenotype | 13653 | 16.2% | |
| group | 9907 | 11.8% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 9 |
|---|---|
| Median length | 7 |
| Mean length | 7.089150146 |
| Min length | 5 |
diseaseClass
Categorical
| Distinct | 755 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 3637 |
| Missing (%) | 4.3% |
| Memory size | 656.5 KiB |
| Value | Count | Frequency (%) | |
| C04 | 6576 | 7.8% | |
| C23;C10 | 5434 | 6.5% | |
| C06;C04 | 4201 | 5.0% | |
| F03 | 4177 | 5.0% | |
| C04;C17 | 3546 | 4.2% | |
| C25;F03 | 2623 | 3.1% | |
| C14 | 2546 | 3.0% | |
| C10 | 2464 | 2.9% | |
| C06;C25 | 2445 | 2.9% | |
| C23 | 2269 | 2.7% | |
| Other values (745) | 44120 | 52.5% | |
| (Missing) | 3637 | 4.3% |
Frequencies of value counts
Unique
| Unique | 223 |
|---|---|
| Unique (%) | 0.3% |
Histogram of lengths of the category
Length
| Max length | 43 |
|---|---|
| Median length | 7 |
| Mean length | 6.534210714 |
| Min length | 3 |
diseaseSemanticType
Categorical
| Distinct | 28 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 656.5 KiB |
| Value | Count | Frequency (%) | |
| Disease or Syndrome | 36204 | 43.1% | |
| Neoplastic Process | 22380 | 26.6% | |
| Mental or Behavioral Dysfunction | 9078 | 10.8% | |
| Pathologic Function | 3383 | 4.0% | |
| Sign or Symptom | 3035 | 3.6% | |
| Finding | 2833 | 3.4% | |
| Congenital Abnormality | 2791 | 3.3% | |
| Experimental Model of Disease | 1309 | 1.6% | |
| Injury or Poisoning | 1198 | 1.4% | |
| Neoplastic Process; Experimental Model of Disease | 694 | 0.8% | |
| Other values (18) | 1133 | 1.3% |
Frequencies of value counts
Unique
| Unique | 3 |
|---|---|
| Unique (%) | < 0.1% |
Histogram of lengths of the category
Length
| Max length | 51 |
|---|---|
| Median length | 19 |
| Mean length | 20.20069492 |
| Min length | 7 |
score
Real number (ℝ≥0)
| Distinct | 70 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.3570343178 |
|---|---|
| Minimum | 0.3 |
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 0.3 |
|---|---|
| 5-th percentile | 0.3 |
| Q1 | 0.3 |
| median | 0.3 |
| Q3 | 0.34 |
| 95-th percentile | 0.7 |
| Maximum | 1 |
| Range | 0.7 |
| Interquartile range (IQR) | 0.04 |
Descriptive statistics
| Standard deviation | 0.1231498229 |
|---|---|
| Coefficient of variation (CV) | 0.3449243301 |
| Kurtosis | 7.601567973 |
| Mean | 0.3570343178 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.744105646 |
| Sum | 30004.45 |
| Variance | 0.01516587888 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 0.3 | 51663 | 61.5% | |
| 0.31 | 6240 | 7.4% | |
| 0.4 | 5635 | 6.7% | |
| 0.32 | 2802 | 3.3% | |
| 0.5 | 2051 | 2.4% | |
| 0.7 | 1876 | 2.2% | |
| 0.6 | 1800 | 2.1% | |
| 0.33 | 1766 | 2.1% | |
| 0.34 | 1126 | 1.3% | |
| 0.35 | 808 | 1.0% | |
| Other values (60) | 8271 | 9.8% |
| Value | Count | Frequency (%) | |
| 0.3 | 51663 | 61.5% | |
| 0.31 | 6240 | 7.4% | |
| 0.32 | 2802 | 3.3% | |
| 0.33 | 1766 | 2.1% | |
| 0.34 | 1126 | 1.3% |
| Value | Count | Frequency (%) | |
| 1 | 380 | 0.5% | |
| 0.99 | 10 | < 0.1% | |
| 0.98 | 15 | < 0.1% | |
| 0.97 | 15 | < 0.1% | |
| 0.96 | 18 | < 0.1% |
EI
Real number (ℝ≥0)
| Distinct | 187 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 4383 |
| Missing (%) | 5.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9915316176 |
|---|---|
| Minimum | 0 |
| Maximum | 1 |
| Zeros | 45 |
| Zeros (%) | 0.1% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.967 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.05104972624 |
|---|---|
| Coefficient of variation (CV) | 0.05148572706 |
| Kurtosis | 130.6567067 |
| Mean | 0.9915316176 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -9.996256584 |
| Sum | 78980.451 |
| Variance | 0.002606074549 |
| Monotonicity | Not monotonic |
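For a variable with missing values, the reported mean is consistent with dividing the sum by the number of non-missing observations rather than by the total row count. The sketch below checks this for the variable summarized above (4383 missing values, sum 78980.451), assuming missing values are simply skipped when the sum and the mean are computed.

```python
# Values copied from the report.
n_rows = 84038       # Number of observations
n_missing = 4383     # Missing
total = 78980.451    # Sum

# Assumption: missing values are excluded from both the sum and the count.
n_valid = n_rows - n_missing
mean = total / n_valid
missing_pct = 100 * n_missing / n_rows

print(f"non-missing rows: {n_valid}")
print(f"mean over non-missing rows: {mean:.6f}")
print(f"missing: {missing_pct:.1f}%")
```

Dividing by all 84038 rows instead would give a noticeably smaller mean than the 0.9915316176 shown in the table.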
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 1 | 74279 | 88.4% | |
| 0.5 | 279 | 0.3% | |
| 0.8 | 259 | 0.3% | |
| 0.667 | 220 | 0.3% | |
| 0.75 | 216 | 0.3% | |
| 0.857 | 148 | 0.2% | |
| 0.833 | 135 | 0.2% | |
| 0.875 | 121 | 0.1% | |
| 0.909 | 115 | 0.1% | |
| 0.9 | 107 | 0.1% | |
| Other values (177) | 3776 | 4.5% | |
| (Missing) | 4383 | 5.2% |
| Value | Count | Frequency (%) | |
| 0 | 45 | 0.1% | |
| 0.2 | 2 | < 0.1% | |
| 0.25 | 5 | < 0.1% | |
| 0.333 | 21 | < 0.1% | |
| 0.4 | 11 | < 0.1% |
| Value | Count | Frequency (%) | |
| 1 | 74279 | 88.4% | |
| 0.997 | 3 | < 0.1% | |
| 0.996 | 2 | < 0.1% | |
| 0.995 | 5 | < 0.1% | |
| 0.994 | 21 | < 0.1% |
YearInitial
Real number (ℝ≥0)
| Distinct | 73 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 4383 |
| Missing (%) | 5.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2007.110878 |
|---|---|
| Minimum | 1924 |
| Maximum | 2020 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 1924 |
|---|---|
| 5-th percentile | 1993 |
| Q1 | 2003 |
| median | 2008 |
| Q3 | 2013 |
| 95-th percentile | 2018 |
| Maximum | 2020 |
| Range | 96 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 7.905874962 |
|---|---|
| Coefficient of variation (CV) | 0.003938932845 |
| Kurtosis | 2.99199418 |
| Mean | 2007.110878 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -1.265467877 |
| Sum | 159876417 |
| Variance | 62.50285892 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 2011 | 5793 | 6.9% | |
| 2010 | 5643 | 6.7% | |
| 2008 | 4572 | 5.4% | |
| 2009 | 4560 | 5.4% | |
| 2006 | 3939 | 4.7% | |
| 2014 | 3897 | 4.6% | |
| 2007 | 3866 | 4.6% | |
| 2005 | 3631 | 4.3% | |
| 2012 | 3457 | 4.1% | |
| 2013 | 3388 | 4.0% | |
| Other values (63) | 36909 | 43.9% | |
| (Missing) | 4383 | 5.2% |
| Value | Count | Frequency (%) | |
| 1924 | 1 | < 0.1% | |
| 1940 | 2 | < 0.1% | |
| 1944 | 1 | < 0.1% | |
| 1947 | 1 | < 0.1% | |
| 1951 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 2020 | 50 | 0.1% | |
| 2019 | 1133 | 1.3% | |
| 2018 | 2957 | 3.5% | |
| 2017 | 2899 | 3.4% | |
| 2016 | 2528 | 3.0% |
YearFinal
Real number (ℝ≥0)
| Distinct | 56 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 4383 |
| Missing (%) | 5.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2011.928215 |
|---|---|
| Minimum | 1962 |
| Maximum | 2020 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 1962 |
|---|---|
| 5-th percentile | 2001 |
| Q1 | 2008 |
| median | 2013 |
| Q3 | 2017 |
| 95-th percentile | 2019 |
| Maximum | 2020 |
| Range | 58 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 6.496091576 |
|---|---|
| Coefficient of variation (CV) | 0.003228788943 |
| Kurtosis | 3.364900169 |
| Mean | 2011.928215 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -1.330282612 |
| Sum | 160260142 |
| Variance | 42.19920576 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 2019 | 9348 | 11.1% | |
| 2018 | 7132 | 8.5% | |
| 2017 | 5717 | 6.8% | |
| 2011 | 5371 | 6.4% | |
| 2010 | 4985 | 5.9% | |
| 2014 | 4418 | 5.3% | |
| 2015 | 4221 | 5.0% | |
| 2016 | 4074 | 4.8% | |
| 2009 | 3877 | 4.6% | |
| 2008 | 3690 | 4.4% | |
| Other values (46) | 26822 | 31.9% | |
| (Missing) | 4383 | 5.2% |
| Value | Count | Frequency (%) | |
| 1962 | 4 | < 0.1% | |
| 1966 | 1 | < 0.1% | |
| 1967 | 1 | < 0.1% | |
| 1968 | 2 | < 0.1% | |
| 1969 | 3 | < 0.1% |
| Value | Count | Frequency (%) | |
| 2020 | 2807 | 3.3% | |
| 2019 | 9348 | 11.1% | |
| 2018 | 7132 | 8.5% | |
| 2017 | 5717 | 6.8% | |
| 2016 | 4074 | 4.8% |
NofPmids
Real number (ℝ≥0)
| Distinct | 69 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.488017325 |
|---|---|
| Minimum | 0 |
| Maximum | 136 |
| Zeros | 7131 |
| Zeros (%) | 8.5% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 4 |
| Maximum | 136 |
| Range | 136 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.430120834 |
|---|---|
| Coefficient of variation (CV) | 1.633126706 |
| Kurtosis | 339.3925622 |
| Mean | 1.488017325 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 13.29523115 |
| Sum | 125050 |
| Variance | 5.905487267 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 1 | 61273 | 72.9% | |
| 2 | 7638 | 9.1% | |
| 0 | 7131 | 8.5% | |
| 3 | 2765 | 3.3% | |
| 4 | 1543 | 1.8% | |
| 5 | 1166 | 1.4% | |
| 6 | 650 | 0.8% | |
| 7 | 358 | 0.4% | |
| 8 | 294 | 0.3% | |
| 9 | 209 | 0.2% | |
| Other values (59) | 1011 | 1.2% |
| Value | Count | Frequency (%) | |
| 0 | 7131 | 8.5% | |
| 1 | 61273 | 72.9% | |
| 2 | 7638 | 9.1% | |
| 3 | 2765 | 3.3% | |
| 4 | 1543 | 1.8% |
| Value | Count | Frequency (%) | |
| 136 | 1 | < 0.1% | |
| 91 | 2 | < 0.1% | |
| 84 | 1 | < 0.1% | |
| 77 | 1 | < 0.1% | |
| 74 | 2 | < 0.1% |
NofSnps
Real number (ℝ≥0)
| Distinct | 189 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.073323972 |
|---|---|
| Minimum | 0 |
| Maximum | 2632 |
| Zeros | 75760 |
| Zeros (%) | 90.1% |
| Memory size | 656.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 3 |
| Maximum | 2632 |
| Range | 2632 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 17.06011828 |
|---|---|
| Coefficient of variation (CV) | 15.89465876 |
| Kurtosis | 11343.82071 |
| Mean | 1.073323972 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 89.28782133 |
| Sum | 90200 |
| Variance | 291.0476359 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 0 | 75760 | 90.1% | |
| 1 | 2580 | 3.1% | |
| 2 | 1193 | 1.4% | |
| 3 | 796 | 0.9% | |
| 4 | 569 | 0.7% | |
| 5 | 406 | 0.5% | |
| 6 | 358 | 0.4% | |
| 7 | 273 | 0.3% | |
| 8 | 205 | 0.2% | |
| 9 | 203 | 0.2% | |
| Other values (179) | 1695 | 2.0% |
| Value | Count | Frequency (%) | |
| 0 | 75760 | 90.1% | |
| 1 | 2580 | 3.1% | |
| 2 | 1193 | 1.4% | |
| 3 | 796 | 0.9% | |
| 4 | 569 | 0.7% |
| Value | Count | Frequency (%) | |
| 2632 | 1 | < 0.1% | |
| 2258 | 1 | < 0.1% | |
| 1252 | 1 | < 0.1% | |
| 1160 | 1 | < 0.1% | |
| 991 | 1 | < 0.1% |
source
Categorical
| Distinct | 51 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 656.5 KiB |
| Value | Count | Frequency (%) | |
| CTD_human | 61485 | 73.2% | |
| GENOMICS_ENGLAND | 5408 | 6.4% | |
| PSYGENET | 3159 | 3.8% | |
| ORPHANET | 3133 | 3.7% | |
| UNIPROT | 1651 | 2.0% | |
| CTD_human;GENOMICS_ENGLAND;UNIPROT | 1411 | 1.7% | |
| CTD_human;GENOMICS_ENGLAND;ORPHANET;UNIPROT | 1182 | 1.4% | |
| CGI | 1084 | 1.3% | |
| CTD_human;GENOMICS_ENGLAND | 979 | 1.2% | |
| CLINGEN | 734 | 0.9% | |
| Other values (41) | 3812 | 4.5% |
Frequencies of value counts
Unique
| Unique | 5 |
|---|---|
| Unique (%) | < 0.1% |
Histogram of lengths of the category
Length
| Max length | 55 |
|---|---|
| Median length | 9 |
| Mean length | 11.07481139 |
| Min length | 3 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
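The calculation described above can be sketched in plain Python. This is a minimal illustration of the definition, not the code used to build the report; the function name `pearson_r` and the sample data are made up for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the variances cancel,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant input has zero variance and makes r undefined.
    return cov / sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(x, [2 * v + 1 for v in x]))     # increasing line
print(pearson_r(x, [-0.5 * v + 3 for v in x]))  # decreasing line
```

Because only the sign of the slope matters for a perfectly linear relationship, the two lines give values at the opposite ends of the range regardless of how steep they are.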
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
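As a minimal sketch of that definition, ρ can be computed by ranking both variables and applying Pearson's formula to the ranks. The helper names below are illustrative, and giving tied values the average of their positions is an assumption about tie handling.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

For this cubic relationship the rank-based coefficient reaches its maximum, while Pearson's r stays below it.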
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
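The pair-counting rule can be written out directly. The sketch below implements exactly the formula stated above (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations frequently use a tie-adjusted variant, so their results may differ when the data contains ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs; ties count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations give three pairs: two concordant, one discordant.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

This quadratic-time loop is fine for illustration but would be slow on a dataset with tens of thousands of rows.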
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation for it is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that was proposed by Bergsma in 2013.
First rows
| | geneId | geneSymbol | DSI | DPI | diseaseId | diseaseName | diseaseType | diseaseClass | diseaseSemanticType | score | EI | YearInitial | YearFinal | NofPmids | NofSnps | source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A1BG | 0.700 | 0.538 | C0019209 | Hepatomegaly | phenotype | C23;C06 | Finding | 0.30 | 1.000 | 2017.0 | 2017.0 | 1 | 0 | CTD_human |
| 1 | 1 | A1BG | 0.700 | 0.538 | C0036341 | Schizophrenia | disease | F03 | Mental or Behavioral Dysfunction | 0.30 | 1.000 | 2015.0 | 2015.0 | 1 | 0 | CTD_human |
| 2 | 2 | A2M | 0.529 | 0.769 | C0002395 | Alzheimer's Disease | disease | C10;F03 | Disease or Syndrome | 0.50 | 0.769 | 1998.0 | 2018.0 | 3 | 0 | CTD_human |
| 3 | 2 | A2M | 0.529 | 0.769 | C0007102 | Malignant tumor of colon | disease | C06;C04 | Neoplastic Process | 0.31 | 1.000 | 2004.0 | 2019.0 | 1 | 0 | CTD_human |
| 4 | 2 | A2M | 0.529 | 0.769 | C0009375 | Colonic Neoplasms | group | C06;C04 | Neoplastic Process | 0.30 | 1.000 | 2004.0 | 2004.0 | 1 | 0 | CTD_human |
| 5 | 2 | A2M | 0.529 | 0.769 | C0011265 | Presenile dementia | disease | C10;F03 | Mental or Behavioral Dysfunction | 0.30 | 1.000 | 1998.0 | 2004.0 | 3 | 0 | CTD_human |
| 6 | 2 | A2M | 0.529 | 0.769 | C0011570 | Mental Depression | disease | F01 | Mental or Behavioral Dysfunction | 0.30 | 1.000 | 1987.0 | 2000.0 | 2 | 0 | PSYGENET |
| 7 | 2 | A2M | 0.529 | 0.769 | C0011581 | Depressive disorder | disease | F03 | Mental or Behavioral Dysfunction | 0.30 | 1.000 | 1987.0 | 2000.0 | 2 | 0 | PSYGENET |
| 8 | 2 | A2M | 0.529 | 0.769 | C0019202 | Hepatolenticular Degeneration | disease | C16;C06;C18;C10 | Disease or Syndrome | 0.30 | 1.000 | 2013.0 | 2013.0 | 1 | 0 | CTD_human |
| 9 | 2 | A2M | 0.529 | 0.769 | C0022660 | Kidney Failure, Acute | disease | C13;C12 | Disease or Syndrome | 0.30 | 1.000 | 2013.0 | 2013.0 | 1 | 0 | CTD_human |
Last rows
| | geneId | geneSymbol | DSI | DPI | diseaseId | diseaseName | diseaseType | diseaseClass | diseaseSemanticType | score | EI | YearInitial | YearFinal | NofPmids | NofSnps | source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 84028 | 106481323 | RNU6-456P | 0.931 | 0.077 | C2931456 | Prostate cancer, familial | disease | C04;C12 | Neoplastic Process | 0.30 | 1.0 | 2018.0 | 2018.0 | 1 | 0 | CTD_human |
| 84029 | 106481323 | RNU6-456P | 0.931 | 0.077 | C4722327 | PROSTATE CANCER, HEREDITARY, 1 | disease | C04;C12 | Neoplastic Process | 0.30 | 1.0 | 2018.0 | 2018.0 | 1 | 0 | CTD_human |
| 84030 | 106783499 | OPA8 | 0.839 | 0.231 | C4085249 | OPTIC ATROPHY 8 | disease | NaN | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | GENOMICS_ENGLAND |
| 84031 | 107075310 | MTCO2P12 | 0.368 | 0.962 | C0268237 | Cytochrome-c Oxidase Deficiency | disease | C16;C18 | Disease or Syndrome; Congenital Abnormality | 0.33 | 1.0 | 1999.0 | 2011.0 | 0 | 0 | GENOMICS_ENGLAND |
| 84032 | 107305681 | DHS6S1 | 1.000 | 0.077 | C0730294 | North Carolina macular dystrophy | disease | C16;C11 | Disease or Syndrome | 0.50 | 1.0 | 2016.0 | 2016.0 | 1 | 0 | CTD_human;ORPHANET |
| 84033 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0002875 | Cooley's anemia | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human |
| 84034 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0005283 | beta Thalassemia | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human |
| 84035 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0019025 | Hemoglobin F Disease | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human |
| 84036 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0085578 | Thalassemia Minor | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human |
| 84037 | 109580095 | HBB-LCR | 0.743 | 0.115 | C0271979 | Thalassemia Intermedia | disease | C16;C15 | Disease or Syndrome | 0.30 | NaN | NaN | NaN | 0 | 0 | CTD_human |